Parameter Initialization


Benchmarking VQE Configurations: Architectures, Initializations, and Optimizers for Silicon Ground State Energy

Boutakka, Zakaria, Innan, Nouhaila, Shafique, Muhammed, Bennai, Mohamed, Sakhi, Z.

arXiv.org Artificial Intelligence

Quantum computing presents a promising path toward precise quantum chemical simulations, particularly for systems that challenge classical methods. This work investigates the performance of the Variational Quantum Eigensolver (VQE) in estimating the ground-state energy of the silicon atom, a relatively heavy element that poses significant computational challenges. Within a hybrid quantum-classical optimization framework, we implement VQE using a range of ansätze, including Double Excitation Gates, ParticleConservingU2, UCCSD, and k-UpCCGSD, combined with various optimizers such as gradient descent, SPSA, and ADAM. The main contribution of this work lies in a systematic methodological exploration of how these configuration choices interact to influence VQE performance, establishing a structured benchmark for selecting optimal settings in quantum chemical simulations. Key findings show that parameter initialization plays a decisive role in the algorithm's stability, and that the combination of a chemically inspired ansatz with adaptive optimization yields superior convergence and precision compared to conventional approaches.
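To make the configuration sweep concrete, here is a minimal PennyLane sketch of the kind of (ansatz, optimizer, initialization) grid such a benchmark explores. The 4-qubit toy Hamiltonian and single double-excitation gate are illustrative stand-ins for the silicon-atom Hamiltonian and the full ansätze, not the paper's actual setup.

```python
import pennylane as qml
from pennylane import numpy as np

# Toy 4-qubit Hamiltonian standing in for the silicon-atom Hamiltonian,
# which would instead be built with a quantum-chemistry package.
H = qml.Hamiltonian([1.0, 0.5],
                    [qml.PauliZ(0) @ qml.PauliZ(1), qml.PauliX(2)])

dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev)
def circuit(params):
    qml.BasisState(np.array([1, 1, 0, 0]), wires=range(4))  # HF-like reference
    qml.DoubleExcitation(params[0], wires=[0, 1, 2, 3])     # chemically inspired gate
    return qml.expval(H)

def run_vqe(opt, init, steps=100):
    params = np.array(init, requires_grad=True)
    for _ in range(steps):
        params = opt.step(circuit, params)
    return circuit(params)

# Sweep (optimizer, initialization) pairs, as in the benchmark; a fresh
# optimizer per run avoids carrying over Adam's internal moment estimates.
optimizers = {"GD": lambda: qml.GradientDescentOptimizer(0.1),
              "Adam": lambda: qml.AdamOptimizer(0.1)}
inits = {"zero": np.zeros(1),
         "random": np.random.uniform(-np.pi, np.pi, 1)}

for opt_name, make_opt in optimizers.items():
    for init_name, init in inits.items():
        energy = run_vqe(make_opt(), init)
        print(f"{opt_name:5s} + {init_name:6s} init: E = {energy:.6f}")
```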


VQEzy: An Open-Source Dataset for Parameter Initialization in Variational Quantum Eigensolvers

Zhang, Chi, Zheng, Mengxin, Lou, Qian, Leung, Hui Min, Chen, Fan

arXiv.org Artificial Intelligence

Variational Quantum Eigensolvers (VQEs) are a leading class of noisy intermediate-scale quantum (NISQ) algorithms, whose performance is highly sensitive to parameter initialization. Although recent machine learning-based initialization methods have achieved state-of-the-art performance, their progress has been limited by the lack of comprehensive datasets. Existing resources are typically restricted to a single domain, contain only a few hundred instances, and lack complete coverage of Hamiltonians, ansatz circuits, and optimization trajectories. To overcome these limitations, we introduce VQEzy, the first large-scale dataset for VQE parameter initialization. VQEzy spans three major domains and seven representative tasks, comprising 12,110 instances with full VQE specifications and complete optimization trajectories. The dataset is available online and will be continuously refined and expanded to support future research in VQE optimization.
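As a rough picture of what such instances could look like, the sketch below defines a hypothetical record carrying a full VQE specification plus its optimization trajectory. All field names here are invented for illustration and are not VQEzy's actual schema.

```python
# Hypothetical sketch of a VQEzy-style instance; the field names are
# illustrative placeholders, not the dataset's real format.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class VQEInstance:
    domain: str                        # e.g. "chemistry", "physics", "materials"
    hamiltonian: List[Tuple[float, str]]  # Pauli-string terms: (coeff, "ZZ...")
    ansatz: str                        # circuit template identifier
    init_params: List[float]           # initial parameter vector
    trajectory: List[List[float]] = field(default_factory=list)  # params per step
    energies: List[float] = field(default_factory=list)          # energy per step

# A learned initializer would be trained to map (hamiltonian, ansatz) to
# good init_params, using the stored trajectories/energies as supervision.
example = VQEInstance(
    domain="chemistry",
    hamiltonian=[(0.5, "ZZ"), (0.3, "XI")],
    ansatz="UCCSD",
    init_params=[0.01, -0.02],
)
```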


DiffQ: Unified Parameter Initialization for Variational Quantum Algorithms via Diffusion Models

Zhang, Chi, Zheng, Mengxin, Lou, Qian, Chen, Fan

arXiv.org Artificial Intelligence

Variational Quantum Algorithms (VQAs) [1] have emerged as leading methods for the noisy intermediate-scale quantum (NISQ) era [2]. By combining limited quantum resources with classical optimizers, they reduce reliance on fault-tolerant devices while offering resilience to noise [1], low circuit complexity [3], and design flexibility [4]. VQAs have already demonstrated success in quantum physics, chemistry, and materials science [5-7]. Despite this promise, their scalability remains a central challenge: as system size increases, optimization landscapes flatten exponentially [8], leading to vanishing gradients and poor convergence. Parameter initialization has therefore become a critical strategy [9], reshaping the landscape to enhance trainability and mitigate suboptimal convergence. Recent deep learning-based initialization methods [10-13] define the state of the art, yet they remain task-specific, depend on limited datasets, and are typically validated in narrow settings, constraining their generalizability across diverse VQA applications.
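A small, self-contained experiment can illustrate the flattening-landscape effect that motivates learned initialization: the variance of a fixed gradient component of a randomly initialized layered circuit shrinks as qubits are added. This PennyLane sketch is illustrative background only, not the DiffQ method itself.

```python
import pennylane as qml
from pennylane import numpy as np

def grad_variance(n_qubits, n_layers=5, n_samples=50):
    """Variance of one gradient component over random initializations."""
    dev = qml.device("default.qubit", wires=n_qubits)

    @qml.qnode(dev)
    def circuit(params):
        qml.StronglyEntanglingLayers(params, wires=range(n_qubits))
        return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

    shape = qml.StronglyEntanglingLayers.shape(n_layers=n_layers,
                                               n_wires=n_qubits)
    grads = []
    for _ in range(n_samples):
        params = np.random.uniform(0, 2 * np.pi, shape, requires_grad=True)
        grads.append(qml.grad(circuit)(params)[0, 0, 0])  # one fixed component
    return np.var(np.array(grads))

# Gradient variance typically decays as qubits are added (barren plateaus).
for n in (2, 4, 6, 8):
    print(f"{n} qubits: Var[dE/dtheta] = {grad_variance(n):.2e}")
```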


Issues with Neural Tangent Kernel Approach to Neural Networks

Liu, Haoran, Tai, Anthony, Crandall, David J., Huang, Chunfeng

arXiv.org Machine Learning

Neural tangent kernels (NTKs) have been proposed to study the behavior of trained neural networks from the perspective of Gaussian processes. An important result in this body of work is the theorem of equivalence between a trained neural network and kernel regression with the corresponding NTK. This theorem allows for an interpretation of neural networks as special cases of kernel regression. However, does this theorem of equivalence hold in practice? In this paper, we revisit the derivation of the NTK rigorously and conduct numerical experiments to evaluate this equivalence theorem. We observe that adding a layer to a neural network and the corresponding updated NTK do not yield matching changes in the predictor error. Furthermore, we observe that kernel regression with a Gaussian process kernel from the literature that does not account for neural network training produces prediction errors very close to those of kernel regression with NTKs. These observations suggest that the equivalence theorem does not hold well in practice and put into question whether neural tangent kernels adequately address the training process of neural networks.
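The equivalence being tested can be checked at toy scale: compare a trained one-hidden-layer ReLU network against kernel regression with its empirical NTK. The NumPy sketch below trains only the first layer with fixed output weights, a simplification of the paper's experimental setting.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, n = 2048, 3, 20                      # width, input dim, train set size

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])                        # toy regression target
Xt = rng.normal(size=(5, d))               # test inputs
W0 = rng.normal(size=(m, d))               # first-layer init (trained)
v = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed output weights

def f(W, A):
    return np.maximum(A @ W.T, 0.0) @ v    # network output on inputs A

def jac(W, A):
    gates = (A @ W.T > 0).astype(float)    # ReLU activation pattern
    # df_i/dW[j,k] = v_j * gate_ij * A[i,k], flattened per input row
    return ((gates * v)[:, :, None] * A[:, None, :]).reshape(len(A), -1)

def ntk(Wa, A, B):
    return jac(Wa, A) @ jac(Wa, B).T       # empirical NTK at weights Wa

# Kernel-regression predictor implied by the NTK at initialization.
K = ntk(W0, X, X) + 1e-8 * np.eye(n)
pred_kernel = f(W0, Xt) + ntk(W0, Xt, X) @ np.linalg.solve(K, y - f(W0, X))

# Gradient-descent training of the same network (W only, squared loss).
W, lr = W0.copy(), 0.2
for _ in range(10000):
    r = f(W, X) - y
    W -= lr * (jac(W, X).T @ r).reshape(m, d) / n
pred_net = f(W, Xt)

print(np.c_[pred_kernel, pred_net])        # columns should roughly agree
```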


A More Accurate Approximation of Activation Function with Few Spikes Neurons

Jeong, Dayena, Park, Jaewoo, Jo, Jeonghee, Park, Jongkil, Kim, Jaewook, Jang, Hyun Jae, Lee, Suyoun, Park, Seongsik

arXiv.org Artificial Intelligence

Objective: Recent deep neural networks (DNNs), such as diffusion models [1], have faced high computational demands. Thus, spiking neural networks (SNNs) have attracted considerable attention as energy-efficient neural networks. However, conventional spiking neurons, such as leaky integrate-and-fire neurons, cannot accurately represent complex non-linear activation functions, such as Swish [2]. To approximate activation functions with spiking neurons, few spikes (FS) neurons were proposed [3], but their approximation performance was limited by the lack of training methods tailored to these neurons. We therefore propose tendency-based parameter initialization (TBPI), which enhances the approximation of activation functions with FS neurons by exploiting temporal dependencies when initializing the training parameters.
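For context, a few-spikes neuron emits at most one spike per time step, with per-step thresholds T[t], subtractive resets h[t], and output weights d[t]; the sum of weighted spikes approximates the target activation. The sketch below uses a hand-set geometric parameter ladder, which yields a ReLU-like staircase; matching a non-monotone function like Swish well is precisely why learned parameters (as in the proposed TBPI) are needed. This is an illustrative baseline, not the paper's method.

```python
import numpy as np

def fs_neuron(x, T, h, d):
    """Few-spikes neuron over K = len(T) time steps (Stoeckl & Maass style)."""
    v = np.array(x, dtype=float)       # membrane potential starts at the input
    out = np.zeros_like(v)
    for t in range(len(T)):
        s = (v >= T[t]).astype(float)  # spike if membrane crosses threshold
        out += s * d[t]                # each spike contributes weight d[t]
        v -= s * h[t]                  # subtractive reset
    return out

def swish(x):
    return x / (1.0 + np.exp(-x))

K = 8
# Hand-set geometric ladder [4, 2, 1, 0.5, ...]: a binary expansion that
# quantizes ReLU on [0, 8); it cannot capture Swish's negative dip.
T = h = d = 2.0 ** -np.arange(-2, K - 2)

x = np.linspace(-4, 4, 200)
err = np.abs(fs_neuron(x, T, h, d) - swish(x))
print("max |FS - Swish| with hand-set parameters:", err.max())
# A training scheme (e.g., the proposed TBPI) would instead learn T, h, d.
```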


Meta-Learning Neural Procedural Biases

Raymond, Christian, Chen, Qi, Xue, Bing, Zhang, Mengjie

arXiv.org Artificial Intelligence

The goal of few-shot learning is to generalize and achieve high performance on new unseen learning tasks, where each task has only a limited number of examples available. Gradient-based meta-learning attempts to address this challenge by learning how to learn new tasks, embedding inductive biases informed by prior learning experiences into the components of the learning algorithm. In this work, we build upon prior research and propose Neural Procedural Bias Meta-Learning (NPBML), a novel framework designed to meta-learn task-adaptive procedural biases. Our approach aims to consolidate recent advancements in meta-learned initializations, optimizers, and loss functions by learning them simultaneously and making them adapt to each individual task to maximize the strength of the learned inductive biases. This imbues each learning task with a unique set of procedural biases that is specifically designed and selected to attain strong learning performance in only a few gradient steps. The experimental results show that by meta-learning the procedural biases of a neural network, we can induce strong inductive biases towards a distribution of learning tasks, enabling robust learning performance across many well-established few-shot learning benchmarks. Humans have an exceptional ability to learn new tasks from only a few example instances. We can often adapt quickly and effectively to new domains by building upon and utilizing past experiences of related tasks, leveraging only a small amount of information about the target domain.
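A stripped-down example of meta-learning procedural biases: on a family of 1-D quadratic tasks, jointly meta-learn an initialization and an inner-loop learning rate. This is a Meta-SGD-flavoured stand-in for NPBML's richer set of biases (initialization, optimizer, and loss), not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
w0, lr = 0.0, 0.05        # meta-learned: initialization w0 and inner step size lr
meta_lr = 0.001

for _ in range(2000):
    c = rng.normal(3.0, 0.5)            # sample a task: minimize (w - c)^2
    g_inner = 2.0 * (w0 - c)            # inner-loop gradient at the init
    w_adapt = w0 - lr * g_inner         # one adaptation step
    r = 2.0 * (w_adapt - c)             # d(outer loss)/d(w_adapt)
    w0 -= meta_lr * r * (1.0 - 2.0 * lr)   # backprop through the inner step
    lr -= meta_lr * r * (-g_inner)

# lr drifts toward 0.5, where a single gradient step solves any quadratic task.
print(f"meta-learned init = {w0:.3f}, inner learning rate = {lr:.3f}")
```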


Cyclic Sparse Training: Is it Enough?

Gadhikar, Advait, Nelaturu, Sree Harsha, Burkholz, Rebekka

arXiv.org Artificial Intelligence

The success of iterative pruning methods in achieving state-of-the-art sparse networks has largely been attributed to improved mask identification and an implicit regularization induced by pruning. We challenge this hypothesis and instead posit that their repeated cyclic training schedules enable improved optimization. To verify this, we show that pruning at initialization is significantly boosted by repeated cyclic training, even outperforming standard iterative pruning methods. We conjecture that the dominant mechanism is a better exploration of the loss landscape, leading to a lower training loss. However, at high sparsity, repeated cyclic training alone is not enough for competitive performance. A strong coupling between the learnt parameter initialization and the mask seems to be required. Standard methods obtain this coupling via expensive pruning-training iterations, starting from a dense network. To achieve this with sparse training instead, we propose SCULPT-ing, i.e., repeated cyclic training of any sparse mask followed by a single pruning step to couple the parameters and the mask, which matches the performance of state-of-the-art iterative pruning methods in the high-sparsity regime at reduced computational cost.
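The recipe can be sketched in a few lines, with linear regression standing in for the network: repeated cosine-schedule training of a fixed sparse mask, then one magnitude-pruning step to couple mask and weights. This is an illustrative skeleton of the described procedure, not the paper's experimental code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
w_true = np.zeros(50)
w_true[:5] = rng.normal(size=5)            # only 5 informative coordinates
y = X @ w_true

def train_cycle(w, mask, steps=200, lr_max=0.1):
    for t in range(steps):
        lr = lr_max * 0.5 * (1 + np.cos(np.pi * t / steps))  # cosine LR cycle
        grad = X.T @ (X @ (w * mask) - y) / len(X)
        w = w - lr * grad * mask                             # masked update
    return w

w = rng.normal(size=50) * 0.1
mask = (rng.random(50) < 0.4).astype(float)  # random sparse mask at init

for _ in range(5):                           # repeated cyclic training
    w = train_cycle(w, mask)

# Single magnitude-pruning step to the target sparsity couples mask & weights.
k = 10                                       # keep 10 of 50 weights (80% sparse)
keep = np.argsort(-np.abs(w * mask))[:k]
mask = np.zeros(50)
mask[keep] = 1.0
w = train_cycle(w, mask)
print("final MSE:", np.mean((X @ (w * mask) - y) ** 2))
```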


Uncertainty Distribution Assessment of Jiles-Atherton Parameter Estimation for Inrush Current Studies

Ugarte-Valdivielso, Jone, Aizpurua, Jose I., Barrenetxea-Iñarra, Manex

arXiv.org Artificial Intelligence

Transformers are one of the key assets in AC distribution grids and renewable power integration. During transformer energization, inrush currents appear, which degrade the transformer and can cause grid instability events. These inrush currents are a consequence of the transformer's magnetic core saturating during its connection to the grid. Transformer cores are normally described by the Jiles-Atherton (JA) model, which contains five parameters. These parameters can be estimated by metaheuristic-based search algorithms, whose convergence depends strongly on how the parameters are initialized. The most popular strategy for JA parameter initialization is a random uniform distribution. However, techniques such as parameter initialization with Probability Density Functions (PDFs) have been shown to improve accuracy over random methods. In this context, this work presents a framework to assess the impact of different parameter initialization strategies on the performance of JA parameter estimation for inrush current studies. Depending on the available data and expert knowledge, uncertainty levels are modelled with different PDFs. Moreover, three different metaheuristic search algorithms are employed on two different core materials, and their accuracy and computational time are compared. Results show an improvement in both the accuracy and the computational time of the metaheuristic-based algorithms when PDF parameter initialization is used.
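The contrast between initialization strategies can be sketched as follows: seed a metaheuristic's population either uniformly over the parameter bounds or from expert-informed PDFs around plausible JA values. The Gaussian priors, parameter values, and quadratic surrogate loss below are illustrative placeholders for the actual inrush-current waveform fit.

```python
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)
# Illustrative values for the five JA parameters (Ms, a, alpha, k, c).
true = np.array([1.6e6, 1100.0, 1.6e-3, 400.0, 0.2])
bounds = [(1e5, 3e6), (10, 5000), (1e-5, 1e-2), (10, 2000), (0.0, 1.0)]

def loss(p):
    # Surrogate for the inrush-current misfit: relative distance to "true".
    return np.sum(((p - true) / true) ** 2)

pop = 40
uniform_init = np.array([[rng.uniform(lo, hi) for lo, hi in bounds]
                         for _ in range(pop)])
# PDF initialization: Gaussians centred on expert guesses, clipped to bounds.
pdf_init = np.clip(rng.normal(true, 0.2 * true, size=(pop, 5)),
                   [b[0] for b in bounds], [b[1] for b in bounds])

for name, init in [("uniform", uniform_init), ("PDF", pdf_init)]:
    res = differential_evolution(loss, bounds, init=init, maxiter=50,
                                 seed=0, tol=1e-12)
    print(f"{name:8s} init -> loss {res.fun:.3e} after {res.nfev} evaluations")
```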


When does MAML Work the Best? An Empirical Study on Model-Agnostic Meta-Learning in NLP Applications

Liu, Zequn, Zhang, Ruiyi, Song, Yiping, Ju, Wei, Zhang, Ming

arXiv.org Artificial Intelligence

Model-Agnostic Meta-Learning (MAML) has been successfully employed in NLP applications, including few-shot text classification and multi-domain low-resource language generation. Many factors, including data quantity, similarity among tasks, and the balance between the general language model and task-specific adaptation, can affect MAML's performance in NLP, but few works have studied them thoroughly. In this paper, we conduct an empirical study to investigate these factors and conclude, based on the experimental results, when MAML works best.


On the Learning Dynamics of Attention Networks

Vashisht, Rahul, Ramaswamy, Harish G.

arXiv.org Artificial Intelligence

Attention models are typically learned by optimizing one of three standard loss functions, variously called soft attention, hard attention, and latent variable marginal likelihood (LVML) attention. All three paradigms are motivated by the same goal of finding two models: a 'focus' model that 'selects' the right segment of the input, and a 'classification' model that processes the selected segment into the target label. However, they differ significantly in the way the selected segments are aggregated, resulting in distinct dynamics and final results. We observe a unique signature of models learned using these paradigms and explain it as a consequence of the evolution of the classification model under gradient descent when the focus model is fixed. We also analyze these paradigms in a simple setting and derive closed-form expressions for the parameter trajectory under gradient flow. With the soft attention loss, the focus model improves quickly at initialization and splutters later on; the hard attention loss behaves in the opposite fashion. Based on our observations, we propose a simple hybrid approach that combines the advantages of the different loss functions, and demonstrate it on a collection of semi-synthetic and real-world datasets.
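The three losses differ only in where the aggregation over segments happens, which a two-segment toy makes explicit: soft attention averages features before the classifier, hard attention averages the log-loss over segments, and LVML averages the likelihood itself. The 1-D logistic setup below is illustrative, not the paper's setting.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([2.0, -1.0])      # segment features: x[0] is the informative one
alpha = np.array([0.5, 0.5])   # focus model's attention weights (softmax output)
w = 1.0                        # classifier weight; target label y = 1

p_seg = sigmoid(w * x)                         # P(y=1 | each segment)

loss_soft = -np.log(sigmoid(w * alpha @ x))    # aggregate features, then classify
loss_hard = -(alpha @ np.log(p_seg))           # expected log-loss over segments
loss_lvml = -np.log(alpha @ p_seg)             # log of the marginal likelihood

print(f"soft {loss_soft:.3f}  hard {loss_hard:.3f}  LVML {loss_lvml:.3f}")
```

With the same attention weights and classifier, the three losses take different values (and different gradients), which is the aggregation difference the abstract describes.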